Self-contained Entity Discovery from Captioned Videos
Authors
Abstract
This article introduces the task of visual named entity discovery in videos without the need for task-specific supervision or external knowledge sources. Assigning specific names to entities (e.g., faces, scenes, or objects) in video frames is a long-standing challenge. Commonly, this problem is addressed as a supervised learning objective by manually annotating entities with labels. To bypass the annotation burden of this setup, several works have investigated utilizing external knowledge sources such as movie databases. While effective, such approaches do not work when these sources are not provided, and they can only be applied to movies and TV series. In this work, we take the problem a step further and propose to discover entities in videos from the videos and their corresponding captions or subtitles. We introduce a three-stage method where we (i) create bipartite entity-name graphs from frame–caption pairs, (ii) find visual entity agreements, and (iii) refine the entity assignment through entity-level prototype construction. To tackle this new problem, we outline two new benchmarks, SC-Friends and SC-BBT, based on the Friends and Big Bang Theory TV series. Experiments on the benchmarks demonstrate the ability of our approach to discover which named entity belongs to which face or scene, with an accuracy close to an oracle, just from the multimodal information present in videos. Additionally, our qualitative examples show the potential challenges of self-contained entity discovery for future work. The code and data are available on GitHub.
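The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D "embeddings", the cosine similarity measure, the greedy clustering with a 0.9 threshold, the majority vote, and all function names below are assumptions chosen only to make the idea concrete.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def build_bipartite_edges(frames):
    """Stage (i): link every visual entity to every name in its frame's caption."""
    entities, edges = [], []
    for embeddings, caption_names in frames:
        for emb in embeddings:
            entities.append(emb)
            edges.append(set(caption_names))
    return entities, edges


def find_agreements(entities, edges, threshold=0.9):
    """Stage (ii): group visually similar entities, then vote on one name per group.

    Greedy clustering against the first member of each group is an
    illustrative stand-in, not the paper's procedure.
    """
    clusters = []
    for i, emb in enumerate(entities):
        for cluster in clusters:
            if cosine(emb, entities[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    assignment = {}
    for cluster in clusters:
        votes = Counter(name for i in cluster for name in edges[i])
        if votes:
            # sorted() makes tie-breaking deterministic
            winner = max(sorted(votes), key=votes.get)
            for i in cluster:
                assignment[i] = winner
    return assignment


def refine_with_prototypes(entities, assignment):
    """Stage (iii): average embeddings per name, reassign each entity to the nearest prototype."""
    members = defaultdict(list)
    for i, name in assignment.items():
        members[name].append(entities[i])
    prototypes = {
        name: [sum(col) / len(vecs) for col in zip(*vecs)]
        for name, vecs in members.items()
    }
    return [
        max(sorted(prototypes), key=lambda n: cosine(emb, prototypes[n]))
        for emb in entities
    ]


# Hypothetical frame-caption pairs: (entity embeddings, names found in the caption).
frames = [
    ([[1.0, 0.0], [0.0, 1.0]], ["Ross", "Rachel"]),  # ambiguous: two faces, two names
    ([[0.98, 0.05]], ["Ross"]),
    ([[0.05, 0.99]], ["Rachel"]),
    ([[0.99, 0.02]], []),                            # no name in the caption
]

entities, edges = build_bipartite_edges(frames)
assignment = find_agreements(entities, edges)
names = refine_with_prototypes(entities, assignment)
print(names)
```

In this toy data, the ambiguous first frame is resolved because visually similar faces in other frames co-occur with only one of the two names, and the face from the frame without any caption name still receives a label through its visual group.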
Similar resources
Indexed Captioned Searchable Videos: A Learning Companion for STEM Coursework
Videos of classroom lectures have proven to be a popular and versatile learning resource. A key shortcoming of the lecture video format is accessing the content of interest hidden in a video. This work meets this challenge with an advanced video framework featuring topical indexing, search, and captioning (ICS videos). Standard optical character recognition (OCR) technology was enhanced with im...
Eye movements while viewing narrated, captioned, and silent videos.
Videos are often accompanied by narration delivered either by an audio stream or by captions, yet little is known about saccadic patterns while viewing narrated video displays. Eye movements were recorded while viewing video clips with (a) audio narration, (b) captions, (c) no narration, or (d) concurrent captions and audio. A surprisingly large proportion of time (>40%) was spent reading capti...
Recognizing Situation Patterns from Self-Contained Stories*
We propose extracting information about characters and actions from a self-contained story, such as news reports. This information is stored in structure patterns called situations. We show how these situation patterns can be constructed by unifying the constituents of sentence analysis with knowledge previously stored in Typed Feature Structures. These situations can be in turn used subsequent...
Self-contained CLI Assemblies
High-level programming languages and bytecode-based virtual execution environments have become popular in software development. Bytecode-based runtimes extend embedded systems with techniques to improve safety, portability, and interoperability. The ECMA/ISO Common Language Infrastructure (CLI) specifies a bytecode-based execution environment (Common Language Runtime) and a comprehensive class ...
Journal
Journal title: ACM Transactions on Multimedia Computing, Communications, and Applications
Year: 2023
ISSN: 1551-6857, 1551-6865
DOI: https://doi.org/10.1145/3583138